9  Probability and Likelihood
In observing the natural world, one encounters “deterministic” events, characterized by relationships between the measured quantities that are clear relative to the experimental uncertainties, and more uncertain events with statistical outcomes
(such as coin tossing or Mendelian gene segregation). The latter raise the general
problem of how to assess the relative merits of alternative hypotheses in the light
of the observed data. Statistics concerns itself with tests of significance and with
estimation (i.e., seeking acceptable values for the parameters of the distributions
specified by the hypotheses).
The method of support proposes that
\[
\text{posterior support} = \text{prior support} + \text{experimental support}
\]
and
\[
\text{information gained} = \log \frac{\text{posterior probability}}{\text{prior probability}}.
\]
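As a concrete illustration (not from the text), support may be identified with log-likelihood, so that prior support is the log of the prior odds between two rival hypotheses. The following sketch, using an assumed coin-tossing model and made-up numbers, shows the additivity of support and the information gained about a hypothesis once data are observed:

```python
import math

# Two rival hypotheses about a coin's probability of heads
# (assumed values for illustration).
p_fair, p_biased = 0.5, 0.7

# Assumed prior probabilities for the two hypotheses.
prior_fair, prior_biased = 0.5, 0.5

# Observed data: 7 heads in 10 tosses (made-up).
heads, tosses = 7, 10

def log_likelihood(p):
    """Log-likelihood of the observed sequence under heads-probability p."""
    return heads * math.log(p) + (tosses - heads) * math.log(1 - p)

# Experimental support for "biased" over "fair" is the difference of
# log-likelihoods; posterior support adds it to the prior support
# (the log of the prior odds).
prior_support = math.log(prior_biased / prior_fair)
experimental_support = log_likelihood(p_biased) - log_likelihood(p_fair)
posterior_support = prior_support + experimental_support

# Convert posterior log-odds back to a posterior probability, then
# compute the information gained about the "biased" hypothesis.
post_biased = 1 / (1 + math.exp(-posterior_support))
info_gained = math.log(post_biased / prior_biased)

print(f"posterior support  = {posterior_support:.3f}")
print(f"information gained = {info_gained:.3f}")
```

With these numbers the experimental support is about 0.82 natural-log units in favor of the biased coin, and the information gained about that hypothesis is about 0.33.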
Two rival approaches to estimation have arisen: the theory of inverse probabil-
ity (due to Laplace), in which the probabilities of causes (i.e., the hypotheses) are
deduced from the frequencies of events, and the method of likelihood (due to Fisher).
In the theory of inverse probability, these probabilities are interpreted as quantitative
and absolute measures of belief. Although it still has its adherents, the system of
inference based on inverse probability suffers from the weakness of supposing that
hypotheses are selected from a continuum of infinitely many hypotheses. The prior
probabilities have to be invented; for example, by imagining a chance setup, in which
case the model is a private one and violates the principle of public demonstrability.
Alternatively, one can apply Laplace’s “Principle of Insufficient Reason”, according
to which each hypothesis is given the same probability if there are no grounds to
believe otherwise. Conceptually, that viewpoint is rather hard to accept. Moreover, if
there are infinitely many equiprobable hypotheses, then each one has an infinitesimal
probability of being correct.
Bayes’ theorem (9.18) may be applied to the weighting of hypotheses if and
only if the model adopted includes a chance setup for the generation of hypotheses
with specific prior probabilities. Without that, the method becomes one of inverse
probability. Equation (9.18) is interpreted as equating the posterior probability of the
hypothesis $E_k$ (after having acquired data $A$) to our prior estimate of the correctness of $E_k$ (i.e., before any data were acquired), $P\{E_k\}$, multiplied by the prior probability of obtaining the data given the hypothesis (i.e., the likelihood; see below), the product being normalized by dividing by the sum over all hypotheses.
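From this description, equation (9.18) presumably takes the standard form (a reconstruction, since the equation itself is not reproduced in this excerpt):
\[
P\{E_k \mid A\} = \frac{P\{E_k\}\, P\{A \mid E_k\}}{\sum_j P\{E_j\}\, P\{A \mid E_j\}},
\]
where the sum in the denominator runs over the full set of rival hypotheses $E_j$.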
A fundamental critique of Bayesian methods is that the Bayes–Laplace approach
regards hypotheses as being drawn at random from a population of hypotheses, a
certain proportion of which is true. “Bayesians” regard it as a strength that they
can include prior knowledge, or rather prior states of belief, in the estimation of the